P-value-based regulatory motif discovery using positional weight matrices.
نویسندگان
چکیده
To analyze gene regulatory networks, the sequence-dependent DNA/RNA binding affinities of proteins and noncoding RNAs are crucial. Often, these are deduced from sets of sequences enriched in factor binding sites. Two classes of computational approaches exist. The first describe binding motifs by sequence patterns and search the patterns with highest statistical significance for enrichment. The second class uses the more powerful position weight matrices (PWMs). Instead of maximizing the statistical significance of enrichment, they maximize a likelihood. Here we present XXmotif (eXhaustive evaluation of matriX motifs), the first PWM-based motif discovery method that can optimize PWMs by directly minimizing their P-values of enrichment. Optimization requires computing millions of enrichment P-values for thousands of PWMs. For a given PWM, the enrichment P-value is calculated efficiently from the match P-values of all possible motif placements in the input sequences using order statistics. The approach can naturally combine P-values for motif enrichment, conservation, and localization. On ChIP-chip/seq, miRNA knock-down, and coexpression data sets from yeast and metazoans, XXmotif outperformed state-of-the-art tools, both in numbers of correctly identified motifs and in the quality of PWMs. In segmentation modules of D. melanogaster, we detect the known key regulators and several new motifs. In human core promoters, XXmotif reports most previously described and eight novel motifs sharply peaked around the transcription start site, among them an Initiator motif similar to the fly and yeast versions. XXmotif's sensitivity, reliability, and usability will help to leverage the quickly accumulating wealth of functional genomics data.
منابع مشابه
The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences
The discovery of regulatory motifs enriched in sets of DNA or RNA sequences is fundamental to the analysis of a great variety of functional genomics experiments. These motifs usually represent binding sites of proteins or non-coding RNAs, which are best described by position weight matrices (PWMs). We have recently developed XXmotif, a de novo motif discovery method that is able to directly opt...
متن کاملA highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information
Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChI...
متن کاملRegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach
RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute ...
متن کاملMACRO-PERFECTOS-APE — MAtrix CompaRisOn & PrEdicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation
Here we present MACRO-APE and PERFECTOS-APE software designed for practical sequence analysis involving classic mononucleotide and dinucleotide position weight matrices (PWMs) of DNA sequence patterns often called motifs. The common usage case for DNA motifs is representation of transcription factor binding sites. The software allows (1) comparing different PWMs using a variant of Jaccard simil...
متن کاملDNA-MATRIX a tool for DNA motif discovery and weight matrix construction
In computational molecular biology, gene regulatory binding sites prediction in whole genome remains a challenge for the researchers. Now a days, the genome wide regulatory binding site prediction tools required either direct pattern sequence or weight matrix. Although there are known transcription factor binding sites databases available for genome wide prediction but no tool is available whic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Genome research
دوره 23 1 شماره
صفحات -
تاریخ انتشار 2013